logo

1. Introduction

In this project, I explore the application of Meta’s Prophet forecasting system to analyze time series data. Prophet is an open-source forecasting tool developed by Meta that is designed to handle time series with strong seasonal effects and multiple seasons of historical data.

1.1 Meta’s Prophet

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, including holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data.

The model equation in Prophet can be represented as:

\[y(t) = g(t) + s(t) + h(t) + \epsilon_t\]

Where: - \(g(t)\) is the trend component - \(s(t)\) is the seasonality component - \(h(t)\) is the holiday effect - \(\epsilon_t\) is the error term

1.2 Dataset: Atmospheric CO2 Concentrations

For this analysis, I will be using the CO2 dataset available in R, which contains atmospheric carbon dioxide measurements from the Mauna Loa Observatory in Hawaii from 1959 to 1997. This dataset is particularly interesting because:

  • It shows both long-term trends (increasing CO2 levels due to human activities)
  • It displays clear seasonal patterns (annual cycles)

Let’s first load the necessary libraries and examine the data:

# Load libraries
library(prophet)
library(zoo)
library(dplyr)
library(ggplot2)

2. Data Exploration

2.1 Understanding the CO2 Dataset

Let’s first examine the CO2 dataset structure and visualize the raw data:

# Load and examine the CO2 dataset
data(co2)
class(co2)
## [1] "ts"
str(co2)
##  Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
start(co2)
## [1] 1959    1
end(co2)
## [1] 1997   12
frequency(co2)
## [1] 12
# Basic summary statistics
summary(co2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   313.2   323.5   335.2   337.1   350.3   366.8
# Plot the raw time series
plot(co2, main="Monthly Atmospheric CO2 Concentration", 
     ylab="CO2 (ppm)", xlab="Year")

2.2 Basic Time Series Analysis

Before applying Prophet, basic analysis is performed to understand the characteristics of the CO2 time series:

# Calculate growth rate
annual_growth <- 100 * mean(diff(co2, lag=12), na.rm=TRUE) / mean(co2)
cat("Average annual growth rate:", round(annual_growth, 2), "%\n")
## Average annual growth rate: 0.37 %
# Decompose the time series to see trend, seasonal, and random components
co2_decomp <- decompose(co2)
plot(co2_decomp)

# Linear trend analysis
time_index <- 1:length(co2)
linear_model <- lm(co2 ~ time_index)
summary(linear_model)
## 
## Call:
## lm(formula = co2 ~ time_index)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0399 -1.9476 -0.0017  1.9113  6.5149 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.115e+02  2.424e-01  1284.9   <2e-16 ***
## time_index  1.090e-01  8.958e-04   121.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared:  0.9695, Adjusted R-squared:  0.9694 
## F-statistic: 1.479e+04 on 1 and 466 DF,  p-value: < 2.2e-16
# Visualize with trend line
plot(co2, main="CO2 with Linear Trend", ylab="CO2 (ppm)", xlab="Year")
abline(linear_model, col="red")

3. Prophet Implementation

3.1 Data Preparation for Prophet

Prophet requires a specific format for input data - a dataframe with columns ‘ds’ (date) and ‘y’ (value):

# Convert ts object to dataframe for Prophet
co2_df <- data.frame(
  ds = zoo::as.yearmon(time(co2)),
  y = as.numeric(co2)
)

# Display the first few rows
head(co2_df)
##         ds      y
## 1 Jan 1959 315.42
## 2 Feb 1959 316.31
## 3 Mar 1959 316.50
## 4 Apr 1959 317.56
## 5 May 1959 318.13
## 6 Jun 1959 318.00

3.2 Basic Prophet Model

Now let’s implement Prophet:

# Create and fit the basic Prophet model
model <- prophet(co2_df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Create future dataframe for forecasting (8 quarters ahead)
future <- make_future_dataframe(model, periods=8, freq="quarter")

# Make predictions
forecast <- predict(model, future)

# Display forecast components
head(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
##           ds     yhat yhat_lower yhat_upper
## 1 1959-01-01 315.2778   314.8370   315.7617
## 2 1959-02-01 316.0343   315.5398   316.4986
## 3 1959-03-01 316.7482   316.2918   317.2268
## 4 1959-04-01 318.0628   317.6105   318.5223
## 5 1959-05-01 318.7045   318.2703   319.1653
## 6 1959-06-01 318.1195   317.6555   318.5570
# Visualize the forecast
plot(model, forecast)

# Plot the components of the forecast
prophet_plot_components(model, forecast)

4. Enhanced Prophet Models

4.1 Model with Adjusted Seasonality

Let’s explore how different seasonality settings affect the forecast:

# Model with multiplicative seasonality
model_mult <- prophet(co2_df, 
                     seasonality.mode = "multiplicative")
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast_mult <- predict(model_mult, future)

# Compare forecasts
plot(model_mult, forecast_mult)

prophet_plot_components(model_mult, forecast_mult)

4.2 Model with Adjusted Changepoints

The trend flexibility can be adjusted by modifying the changepoint parameters:

# Model with more flexible trend
model_flexible <- prophet(co2_df, 
                         changepoint.prior.scale = 0.5)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast_flexible <- predict(model_flexible, future)

# Compare trends
plot(model_flexible, forecast_flexible)

4.3 Interactive Visualization

Let’s create an interactive visualization of our forecast:

library(plotly)

# Create interactive plot
plot_ly() %>%
  add_lines(x = ~co2_df$ds, y = ~co2_df$y, 
            name = "Historical", line = list(color = "black")) %>%
  add_lines(x = ~forecast$ds, y = ~forecast$yhat, 
            name = "Forecast", line = list(color = "blue")) %>%
  add_ribbons(x = ~forecast$ds, 
             ymin = ~forecast$yhat_lower, 
             ymax = ~forecast$yhat_upper,
             name = "Uncertainty", 
             fillcolor = "rgba(0, 0, 255, 0.2)") %>%
  layout(title = "CO2 Concentration Forecast",
         xaxis = list(title = "Date"),
         yaxis = list(title = "CO2 (ppm)"))

5. Model Comparison and Evaluation

Let’s compare the different models we’ve created:

# Extract and combine prediction results
results <- data.frame(
  Date = tail(forecast$ds, 8),
  Basic = tail(forecast$yhat, 8),
  Multiplicative = tail(forecast_mult$yhat, 8),
  Flexible = tail(forecast_flexible$yhat, 8)
)

# Display comparison table
knitr::kable(results, caption = "Forecast Comparison for Next 8 Quarters")
Forecast Comparison for Next 8 Quarters
Date Basic Multiplicative Flexible
1998-03-01 365.9377 366.0603 365.9533
1998-06-01 367.3121 367.5168 367.3280
1998-09-01 362.2602 362.0111 362.2773
1998-12-01 364.7259 364.6641 364.7419
1999-03-01 367.2811 367.4018 367.3031
1999-06-01 368.7574 368.9770 368.7780
1999-09-01 363.7058 363.4474 363.7284
1999-12-01 366.1395 366.0765 366.1605
# Calculate metrics for in-sample fit
# Use the last 12 data points as a test set
train_df <- head(co2_df, nrow(co2_df) - 12)
test_df <- tail(co2_df, 12)

# Train models on training data
model_train <- prophet(train_df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future_train <- make_future_dataframe(model_train, periods=12, freq="month")
forecast_train <- predict(model_train, future_train)

# Calculate RMSE
forecast_values <- tail(forecast_train$yhat, 12)
actual_values <- test_df$y
rmse <- sqrt(mean((forecast_values - actual_values)^2))
cat("Root Mean Square Error (RMSE):", round(rmse, 3), "ppm\n")
## Root Mean Square Error (RMSE): 0.626 ppm
# Calculate MAPE
mape <- 100 * mean(abs((forecast_values - actual_values) / actual_values))
cat("Mean Absolute Percentage Error (MAPE):", round(mape, 3), "%\n")
## Mean Absolute Percentage Error (MAPE): 0.147 %

6. Discussion and Interpretation

Based on the analysis and forecasting results, we can draw several insights:

  1. Long-term Trend: The CO2 concentration shows a clear upward trend over time, increasing at an average rate of 0.37% per year.

  2. Seasonality: There is a distinct seasonal pattern in the CO2 data, with peaks typically occurring in March, April and May and troughs in October and November.

  3. Forecast Accuracy: The Prophet model achieved a RMSE of 0.626 ppm and a MAPE of 0.147%.

  4. Future Projections: Based on the forecast, CO2 levels are expected to reach 366 ppm by the year 2000, continuing the concerning upward trend.

7. Conclusion

In this project, I demonstrated the application of Meta’s Prophet forecasting system to analyze atmospheric CO2 concentration data. The Prophet model effectively captured both the long-term trend and seasonal patterns in the data, providing valuable insights and accurate forecasts.

The results highlight the continued increase in atmospheric CO2 levels, which has significant implications for climate change and environmental policy. The forecasting approach demonstrated here could be applied to other environmental time series to support decision-making and planning.

Prophet proved to be a powerful and flexible tool for time series forecasting, allowing for easy incorporation of seasonal patterns and trend changes with almost no manual configuration.

References

  • Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
  • Meta Prophet documentation: https://facebook.github.io/prophet/
  • R Documentation for the CO2 dataset: ?co2